1 Introduction

2 Background

3 Descriptive analysis

3.1 Overview

To get a good sense of how the data are distributed, each state in the USA is plotted in a shade of purple that signifies its rate of vehicle fatalities per 10k people. The rate of fatalities was chosen instead of the raw count because population differs widely between states. In the map below, while California averages more than 5000 fatalities per year, that number is relatively small compared to its population size. In other words, for every 10k people in the state, only about 2 were involved in fatal vehicle accidents.

On the other hand, Texas appears to have the highest average rate of fatalities per 10k people. With an average population of 16.3 million between 1982 and 1988, this means that for every 10k people in Texas, 3.65 were involved in fatal accidents, which is close to double California's rate.
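As a rough check on the arithmetic behind these rates, the sketch below converts between counts and rates per 10k people. It uses only the figures quoted in the text (16.3 million, 3.65 and 2 per 10k); the function names are illustrative, and the Texas count it produces is merely what those quoted figures imply, not a value read from the dataset.

```python
def rate_per_10k(fatalities, population):
    """Fatalities per 10,000 residents."""
    return fatalities / population * 10_000


def implied_fatalities(rate, population):
    """Yearly fatality count implied by a rate per 10,000 residents."""
    return rate * population / 10_000


# Figures quoted in the text above.
texas_population = 16.3e6   # average population, 1982-1988
texas_rate = 3.65           # fatalities per 10k people
california_rate = 2.0       # fatalities per 10k people

# Count implied by the quoted rate and population (an inference, not data).
texas_fatalities = implied_fatalities(texas_rate, texas_population)

# How many times larger the Texas rate is than the California rate.
rate_ratio = texas_rate / california_rate

print(f"Implied Texas fatalities per year: {texas_fatalities:.1f}")
print(f"Texas / California rate ratio: {rate_ratio:.3f}")
```

Comparing rates rather than counts is what makes states of very different size comparable on the same map.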

3.2 Exploratory Data Analysis

As the focus of this data analysis is to find out whether laws that were implemented to tackle drunk-driving-related fatalities were effective, only a subset of the variables from the Fatalities dataset was used. In particular, response variables such as the total number of fatalities and alcohol-related fatalities were examined, along with predictor variables that are closely related to alcohol-consumption-driven laws.

3.2.1 Univariate Analysis

We start the exploratory data analysis by examining the predictor and response variables individually. The goal here is to understand how the data are distributed, which helps set expectations about how the variables correlate with each other and whether model assumptions will be met.

3.2.1.1 Predictor Variables

As we are looking into how alcohol-consumption-driven laws impact the rate of alcohol-related fatalities, the variables of interest include spirits consumption, beer tax, the proportion of the population living in dry counties, minimum drinking age, and the mandatory punishments implemented by each state throughout the 7 years.

The plot below shows the top 5 states in terms of average spirits consumption, average beer tax, and average proportion of the population living in dry counties between 1982 and 1988. Other than North Carolina (NC), which is in the top 5 for both beer tax and proportion of residents living in dry counties, there is no other “standout” state, i.e., no other state appears in more than one of the top 5 lists.

On the other hand, laws were increasingly implemented or tightened throughout the 7 years. The most obvious change is the number of states that increased the minimum drinking age. In 1982, almost half the country had set the minimum drinking age below 21, yet by 1988 most states had opted for a minimum drinking age of 21.

Additionally, there seems to be a slight increasing trend in the number of states that implemented testing (breath test) and punishments (mandatory jail sentence and mandatory community service) between 1982 and 1988. We need to note, however, that the number of states implementing mandatory jail sentences decreased very slightly from 1986 to 1988. This raises the question of whether a mandatory jail sentence is effective in combating drunk driving. Such questions will be addressed after fitting a suitable model.

3.2.1.2 Response Variables

After observing the trend of the implemented laws, the focus now switches to the distribution of fatalities and alcohol-related fatalities across the country. A quick look at the top two histograms below might suggest that a large portion of states have fewer than 1000 fatalities per year and fewer than 500 alcohol-related fatalities per year. However, each state’s population needs to be taken into account because population sizes vary significantly across the country. The population-adjusted histograms (bottom two) show distributions that can be approximated as normal. Since the goal of this analysis is to discuss the effects of alcohol-related laws, the ratio of alcohol-related fatalities to overall fatalities becomes our main topic of interest.
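The ratio of interest can be sketched as follows. This is a minimal illustration on made-up state-year records: the state code "AA", the counts, and the field names `fatal` and `afatal` are assumptions for the example, not values from the analysis.

```python
def alcohol_share(alcohol_fatalities, total_fatalities):
    """Proportion of all vehicle fatalities that were alcohol-related."""
    if total_fatalities <= 0:
        raise ValueError("total_fatalities must be positive")
    return alcohol_fatalities / total_fatalities


# Hypothetical records for one fictional state (not real data).
records = [
    {"state": "AA", "year": 1982, "fatal": 800, "afatal": 280},
    {"state": "AA", "year": 1988, "fatal": 900, "afatal": 270},
]

# Share of alcohol-related fatalities per (state, year).
shares = {
    (r["state"], r["year"]): alcohol_share(r["afatal"], r["fatal"])
    for r in records
}

for (state, year), share in sorted(shares.items()):
    print(f"{state} {year}: {share:.2f}")
```

In this toy case the total count rises while the alcohol-related share falls, which is the kind of movement the ratio is meant to expose.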

The plot below shows the proportion of alcohol-related fatalities in each state throughout the years. In most states the proportion has been either constant or decreasing over those 7 years. The decrease is most evident in states such as Kansas (KS), North Dakota (ND), and Arkansas (AR). However, there is one exception to this trend: in the line plot below, Mississippi (MS) shows a significant increase in the proportion of alcohol-related fatalities from 1983 to 1988.

3.2.2 Multivariate Analysis

4 Inferential analysis

## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## unemp      -0.0567273  0.0071020 -7.9875 3.715e-14 ***
## jailyes    -0.0075778  0.1245342 -0.0608 0.9515232    
## drinkage19 -0.0421462  0.0662733 -0.6359 0.5253372    
## drinkage20 -0.0223700  0.0726344 -0.3080 0.7583287    
## drinkage21  0.0520453  0.0696946  0.7468 0.4558406    
## beertax    -0.5877703  0.1716524 -3.4242 0.0007099 ***
## spirits     0.6944051  0.0802470  8.6533 4.154e-16 ***
## dry         0.0279848  0.0134966  2.0735 0.0390522 *  
## serviceyes  0.0361896  0.1439526  0.2514 0.8016915    
## breathyes   0.0206776  0.0506726  0.4081 0.6835426    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = fatal_r ~ unemp + jail + drinkage + beertax + spirits + 
##     dry + service + breath, data = data, model = "within", index = c("state", 
##     "year"))
## 
## Unbalanced Panel: n = 48, T = 6-7, N = 335
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -0.4718637 -0.0788562  0.0018826  0.0794093  0.6230710 
## 
## Coefficients:
##              Estimate Std. Error t-value  Pr(>|t|)    
## unemp      -0.0567273  0.0071020 -7.9875 3.715e-14 ***
## jailyes    -0.0075778  0.1245342 -0.0608 0.9515232    
## drinkage19 -0.0421462  0.0662733 -0.6359 0.5253372    
## drinkage20 -0.0223700  0.0726344 -0.3080 0.7583287    
## drinkage21  0.0520453  0.0696946  0.7468 0.4558406    
## beertax    -0.5877703  0.1716524 -3.4242 0.0007099 ***
## spirits     0.6944051  0.0802470  8.6533 4.154e-16 ***
## dry         0.0279848  0.0134966  2.0735 0.0390522 *  
## serviceyes  0.0361896  0.1439526  0.2514 0.8016915    
## breathyes   0.0206776  0.0506726  0.4081 0.6835426    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    10.785
## Residual Sum of Squares: 7.3356
## R-Squared:      0.31982
## Adj. R-Squared: 0.17986
## F-statistic: 13.0247 on 10 and 277 DF, p-value: < 2.22e-16
## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## unemp      -0.0133817  0.0055587 -2.4073   0.01672 *  
## jailyes     0.2039941  0.0974724  2.0928   0.03727 *  
## drinkage19  0.0370111  0.0518718  0.7135   0.47613    
## drinkage20  0.0122483  0.0568506  0.2154   0.82958    
## drinkage21  0.0302554  0.0545496  0.5546   0.57959    
## beertax    -0.2039728  0.1343515 -1.5182   0.13010    
## spirits     0.3989548  0.0628089  6.3519 8.692e-10 ***
## dry         0.0048993  0.0105637  0.4638   0.64316    
## serviceyes -0.1983558  0.1126710 -1.7605   0.07943 .  
## breathyes  -0.0152399  0.0396612 -0.3843   0.70109    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = afatal_r ~ unemp + jail + drinkage + beertax + 
##     spirits + dry + service + breath, data = data, model = "within", 
##     index = c("state", "year"))
## 
## Unbalanced Panel: n = 48, T = 6-7, N = 335
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -0.9489734 -0.0571550 -0.0073183  0.0504910  0.4753808 
## 
## Coefficients:
##              Estimate Std. Error t-value  Pr(>|t|)    
## unemp      -0.0133817  0.0055587 -2.4073   0.01672 *  
## jailyes     0.2039941  0.0974724  2.0928   0.03727 *  
## drinkage19  0.0370111  0.0518718  0.7135   0.47613    
## drinkage20  0.0122483  0.0568506  0.2154   0.82958    
## drinkage21  0.0302554  0.0545496  0.5546   0.57959    
## beertax    -0.2039728  0.1343515 -1.5182   0.13010    
## spirits     0.3989548  0.0628089  6.3519 8.692e-10 ***
## dry         0.0048993  0.0105637  0.4638   0.64316    
## serviceyes -0.1983558  0.1126710 -1.7605   0.07943 .  
## breathyes  -0.0152399  0.0396612 -0.3843   0.70109    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    5.3854
## Residual Sum of Squares: 4.4939
## R-Squared:      0.16555
## Adj. R-Squared: -0.0061595
## F-statistic: 5.49553 on 10 and 277 DF, p-value: 1.8671e-07
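
The fit statistics printed above can be cross-checked from the sums of squares alone. The sketch below is one way to do that, assuming the residual degrees of freedom equal N - n - K (observations minus states minus slope coefficients, here 335 - 48 - 10 = 277) and that the adjusted R-squared rescales by (N - 1) / df. These formulas are inferred from the printed output and agree with it up to rounding; they are not taken from the plm source.

```python
def within_fit_stats(tss, rss, n_obs, n_entities, n_coefs):
    """R-squared, adjusted R-squared and F statistic for a within model.

    Assumes df_resid = n_obs - n_entities - n_coefs (inferred from the
    printed '10 and 277 DF', not taken from the plm source code).
    """
    df_resid = n_obs - n_entities - n_coefs
    r2 = 1 - rss / tss
    adj_r2 = 1 - (1 - r2) * (n_obs - 1) / df_resid
    f_stat = ((tss - rss) / n_coefs) / (rss / df_resid)
    return {"df_resid": df_resid, "r2": r2, "adj_r2": adj_r2, "f": f_stat}


# Sums of squares copied from the two summaries above.
fatal_stats = within_fit_stats(10.785, 7.3356, n_obs=335, n_entities=48, n_coefs=10)
afatal_stats = within_fit_stats(5.3854, 4.4939, n_obs=335, n_entities=48, n_coefs=10)

for name, stats in (("fatal_r", fatal_stats), ("afatal_r", afatal_stats)):
    print(name, {k: round(v, 4) for k, v in stats.items()})
```

The negative adjusted R-squared for the `afatal_r` model follows directly from this rescaling: with 48 state effects absorbed, an R-squared of about 0.17 is not enough to offset the degrees-of-freedom penalty.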

5 Sensitivity analysis

6 Causal interpretation

In panel data, measurements are taken on the same entity (state) repeatedly at different time points (years). In addition, the fixed effect model adopted in the current project accounts for unobserved, entity-specific, time-invariant confounders. Given these features, it might seem reasonable to draw causal inferences about the effect of significant predictors on the response variable. However, a fixed effect model requires strong exogeneity assumptions for causal inference, including: (a) no unobserved time-varying confounders; (b) past outcomes do not directly affect the current outcome; (c) past treatments do not directly affect the current outcome; (d) past outcomes do not directly affect the current treatment (no reverse causation).
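
To illustrate how a fixed effect model absorbs time-invariant, entity-specific confounders, the sketch below applies the within (demeaning) transformation to a noise-free toy panel. The data, the state labels "A" and "B", and the function names are made up for illustration; this is not the estimation code used in the analysis.

```python
from collections import defaultdict


def pooled_slope(x, y):
    """OLS slope of y on x, ignoring the panel structure."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = sum((a - mean_x) ** 2 for a in x)
    return num / den


def within_slope(entities, x, y):
    """OLS slope after subtracting each entity's own mean from x and y."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # sum of x, sum of y, count
    for e, xi, yi in zip(entities, x, y):
        totals[e][0] += xi
        totals[e][1] += yi
        totals[e][2] += 1
    num = den = 0.0
    for e, xi, yi in zip(entities, x, y):
        sum_x, sum_y, count = totals[e]
        dx = xi - sum_x / count
        dy = yi - sum_y / count
        num += dx * dy
        den += dx * dx
    return num / den


# Toy panel: y = alpha_state + true_beta * x, with no noise.
true_beta = -0.5
alpha = {"A": 1.0, "B": 3.0}  # time-invariant state effects (hypothetical)
entities = ["A", "A", "A", "B", "B", "B"]
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [alpha[e] + true_beta * xi for e, xi in zip(entities, x)]

print("pooled slope:", pooled_slope(x, y))
print("within slope:", within_slope(entities, x, y))
```

Because the state with the larger intercept also has the larger x values, the pooled slope comes out positive, while demeaning within each state recovers the true negative slope. Note that demeaning removes only confounders that are constant over time, which is why assumption (a) still has to be made separately.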

Assumption (a) is hard to verify and difficult to relax under the fixed effect model. We therefore assumed that no time-varying covariates were omitted from the current model, and examined whether the other assumptions were violated and how they could be relaxed. Assumption (b) can be relaxed without interfering with the causal inference between current treatment and current outcome, so long as we condition on past treatment and assume that past outcome does not directly affect current treatment. To relax assumption (c), we could add a small number of lagged treatment terms to the model (e.g. the treatment from the year before). Last, for assumption (d), no reverse causation, a popular approach is to use instrumental variables for endogenous predictors. Endogenous predictors are those included in the model that are correlated with the error term. This can happen when the response variable in turn causes the predictor, or when an omitted confounder affects both the dependent and independent variables. Instrumental variables are variables not included in the model that are associated with the endogenous predictor but not with the unobserved confounders.
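
As a sketch of what adding a lagged treatment would involve, the snippet below builds a one-year lag within each state. The rows, the state codes, and the helper name `add_lag` are hypothetical; the only point taken from the analysis is that the panel is unbalanced, so a missing year must not be bridged by an older value.

```python
def add_lag(rows, key="beertax", entity="state", time="year"):
    """Return copies of rows with '<key>_lag1' set to the value from the
    previous year for the same entity, or None if that year is missing."""
    lookup = {(r[entity], r[time]): r[key] for r in rows}
    lagged_rows = []
    for r in rows:
        new_row = dict(r)
        new_row[f"{key}_lag1"] = lookup.get((r[entity], r[time] - 1))
        lagged_rows.append(new_row)
    return lagged_rows


# Hypothetical unbalanced panel: state "BB" has no 1983 record.
panel = [
    {"state": "AA", "year": 1982, "beertax": 0.50},
    {"state": "AA", "year": 1983, "beertax": 0.55},
    {"state": "BB", "year": 1982, "beertax": 0.20},
    {"state": "BB", "year": 1984, "beertax": 0.25},
]

lagged_panel = add_lag(panel)
for row in lagged_panel:
    print(row)
```

Rows whose lag is `None` (the first year of each state, and any year following a gap) would have to be dropped before fitting, which shortens an already short panel.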

Some previous studies on the traffic policy environment and fatality rates suggested using alcohol regulations as instrumental variables for alcohol consumption when investigating the effect of alcohol consumption on traffic fatalities. The argument is that such regulations affect traffic fatalities only through alcohol consumption, and previous studies have shown that they have a significant effect on alcohol consumption. In the current dataset, the covariate related to alcohol consumption is “spirits”, and the alcohol regulations include “drinkage” (minimum drinking age) and “beertax”.

To verify the appropriateness of drinkage and beertax as instrumental variables for spirits consumption, under-identification, weak instruments, and over-identification need to be tested. The under-identification test has the null hypothesis that the candidate instrument (beertax or drinkage) is unrelated to spirits. This can be done through a simple t-test or a likelihood ratio test. The results showed that beertax was not associated with spirits consumption (Pr(>F) = 0.1012), whereas drinkage had a significant effect (Pr(>F) < 0.0001). Thus, beertax failed the under-identification test.

Weak instruments were tested by calculating the Cragg-Donald F statistic and comparing it against the Stock and Yogo critical values. The null hypothesis (the instrumental variables are weak) can be rejected if the Cragg-Donald F statistic is greater than the critical value. The Cragg-Donald F statistic calculated for drinkage was 10.59 and the critical value was 22.3, so we failed to reject the null at significance level 0.05. As a result, we could not find appropriate instrumental variables for spirits in the current dataset. If more measures were available, such as other alcohol regulations and other alcohol consumption information, we might be able to find more suitable instrumental variables.
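
The screening logic described above can be summarized in a few lines. The sketch below only encodes the decision rules using the reported numbers; the function name and verdict strings are illustrative, and 0.0001 is used as an upper bound for the drinkage p-value because only "< 0.0001" was reported. It does not compute the test statistics themselves.

```python
def screen_instrument(relevance_p, cragg_donald_f=None, critical_value=None,
                      alpha=0.05):
    """Apply the two screening rules described in the text, in order.

    1. Under-identification: the instrument must be significantly related
       to the endogenous predictor (p-value below alpha).
    2. Weak instrument: the Cragg-Donald F statistic must exceed the
       Stock and Yogo critical value.
    """
    if relevance_p >= alpha:
        return "fails under-identification test"
    if cragg_donald_f is None or critical_value is None:
        return "relevant; strength not tested"
    if cragg_donald_f <= critical_value:
        return "relevant but weak"
    return "relevant and strong"


# Numbers reported in the text; 0.0001 is an upper bound for "< 0.0001".
beertax_verdict = screen_instrument(relevance_p=0.1012)
drinkage_verdict = screen_instrument(relevance_p=0.0001,
                                     cragg_donald_f=10.59,
                                     critical_value=22.3)

print("beertax:", beertax_verdict)
print("drinkage:", drinkage_verdict)
```

Neither candidate passes both checks, which matches the conclusion that no suitable instrument for spirits was found in this dataset.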

7 Discussion

Acknowledgement

Reference

Session info

## R version 4.0.3 (2020-10-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] plm_2.4-0       gplots_3.1.1    panelr_0.7.5    lme4_1.1-25    
##  [5] Matrix_1.2-18   GGally_2.0.0    forcats_0.5.0   stringr_1.4.0  
##  [9] dplyr_1.0.2     purrr_0.3.4     readr_1.4.0     tidyr_1.1.2    
## [13] tibble_3.0.4    tidyverse_1.3.0 plotly_4.9.2.1  ggplot2_3.3.2  
## [17] AER_1.2-9       survival_3.2-7  sandwich_3.0-0  lmtest_0.9-38  
## [21] zoo_1.8-8       car_3.0-10      carData_3.0-4  
## 
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-150       bitops_1.0-6       fs_1.5.0           lubridate_1.7.9.2 
##  [5] RColorBrewer_1.1-2 httr_1.4.2         tools_4.0.3        backports_1.2.0   
##  [9] R6_2.5.0           KernSmooth_2.23-18 DBI_1.1.0          lazyeval_0.2.2    
## [13] colorspace_2.0-0   withr_2.3.0        tidyselect_1.1.0   gridExtra_2.3     
## [17] curl_4.3           compiler_4.0.3     cli_2.2.0          rvest_0.3.6       
## [21] xml2_1.3.2         labeling_0.4.2     caTools_1.18.0     scales_1.1.1      
## [25] digest_0.6.27      minqa_1.2.4        foreign_0.8-80     rmarkdown_2.5     
## [29] rio_0.5.16         pkgconfig_2.0.3    htmltools_0.5.0    dbplyr_2.0.0      
## [33] htmlwidgets_1.5.2  rlang_0.4.8        readxl_1.3.1       rstudioapi_0.13   
## [37] generics_0.1.0     farver_2.0.3       jsonlite_1.7.1     gtools_3.8.2      
## [41] crosstalk_1.1.0.1  zip_2.1.1          magrittr_2.0.1     Formula_1.2-4     
## [45] Rcpp_1.0.5         munsell_0.5.0      fansi_0.4.2        abind_1.4-5       
## [49] lifecycle_0.2.0    stringi_1.5.3      yaml_2.2.1         gbRd_0.4-11       
## [53] MASS_7.3-53        plyr_1.8.6         bdsmatrix_1.3-4    crayon_1.3.4      
## [57] lattice_0.20-41    haven_2.3.1        splines_4.0.3      pander_0.6.3      
## [61] jtools_2.1.2       hms_0.5.3          knitr_1.30         pillar_1.4.7      
## [65] boot_1.3-25        reprex_1.0.0       glue_1.4.2         evaluate_0.14     
## [69] data.table_1.13.2  modelr_0.1.8       Rdpack_2.1         nloptr_1.2.2.2    
## [73] vctrs_0.3.5        miscTools_0.6-26   cellranger_1.1.0   gtable_0.3.0      
## [77] reshape_0.8.8      assertthat_0.2.1   xfun_0.19          openxlsx_4.2.3    
## [81] rbibutils_2.0      broom_0.7.2        viridisLite_0.3.0  maxLik_1.4-6      
## [85] statmod_1.4.35     ellipsis_0.3.1